Abstract: Identifying catalytic residues is important for biologists studying protein function. In previous studies, many
researchers have successfully applied sequence- and structure-based features to predict the positions of catalytic residues
in proteins. A high correlation between atomic fluctuations and catalytic positions has also been observed. In this study,
we investigated whether this information is hidden in X-ray diffraction data. The results of our test indicate that
catalytic residues can be identified during the protein structure refinement process.
I. Introduction

Understanding the function of a protein is a major goal for structural biologists who solve its structure. However, many
protein structures from structural genomics projects [1] have unknown function. To save time and money, researchers often
turn to theoretical methods for identifying potential functional sites. Many in silico methods have been designed for this
purpose. These include sequence-based methods, such as L1pred [2] and PINGU [3], and structure-based methods, such as the
approach of Jorge Fajardo [4] and the WCN approach [5].

Recently, Huang et al. [5] proposed a prediction method based on the dynamic nature of catalytic residues. They selected
candidate residues by ranking a crowdedness value, which they called the weighted contact number (WCN), for each residue in
a given protein. The occurrence of each amino acid type among catalytic residues was also investigated. With their approach,
potential catalytic residues can be predicted from a single protein structure. Huang's method builds on Lin et al. [6], who
noted that the WCN model can be simplified to a centroid model (CM) [7]. The CM has been used more than once as an
alternative to the translation/libration/screw (TLS) model [8-13], which is frequently used in X-ray structure refinement.
Therefore, the TLS model could reasonably be considered as a way to predict catalytic residues from a single structure.

The major advantage of the TLS model is its widespread use in the structural biology community. If the prediction could be
carried out together with protein structure refinement, it would be very useful for structural biologists. Here, we show
that catalytic residues in enzymes can be predicted directly from the X-ray structure refinement process by using
TLS-derived B-factors.

We tested our approach on a previously reported enzyme dataset [5]. The results show that our approach is viable for
locating catalytic residues in proteins.

II. Methodology 

2.1 Computed B-factor profiles based on the TLS model


The atomic fluctuation derived from the TLS model [8] is defined by Sternberg et al. [10] as: 




International Journal of Engineering Research & Science (IJOER) 


ISSN, [2395-6992] 


[Vol-2, Issue-6 June- 2016] 


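\[ \langle \Delta r \cdot \Delta r \rangle = \operatorname{tr}\left( T + S^{T} \times n - n \times S - n \times L \times n \right) \qquad (1) \]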


where Δr refers to the displacement of an atom; T, L, and S denote the translation, libration and screw matrices,
respectively. The position of an atom relative to the origin is denoted by n. The translation matrix is built from the
displacement correlations between translation vectors along the three directions of the Cartesian coordinate system. The
libration matrix contains the displacement correlations between rotation vectors about the three Cartesian axes.
Correlations between the translation and rotation vectors are used to build the screw matrix. Each of T, L, and S is a
3 × 3 matrix; T and L are symmetric, and S is usually defined with respect to an arbitrarily specified origin. In total,
10 TLS parameters need to be refined to obtain the required atomic fluctuation [10]. The B-factor computed from Equation 1
will be referred to as the TLS-derived B-factor, or B_TLS, throughout this article.
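
As a rough illustration of how a B-factor could be computed from TLS matrices, the sketch below evaluates the trace in Equation 1 in plain Python. It assumes the commonly used matrix form U = T + A L A^T + A S + S^T A^T, where A is the skew-symmetric matrix built from the atomic position, and the standard relation B = (8π²/3) tr(U). The function names and the numerical values are hypothetical and are not taken from the study.

```python
import math

def skew(n):
    """Matrix A such that A applied to a rotation vector lam gives lam x n."""
    x, y, z = n
    return [[0.0, z, -y],
            [-z, 0.0, x],
            [y, -x, 0.0]]

def matmul(a, b):
    """Product of two 3 x 3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def tls_b_factor(T, L, S, n):
    """Isotropic B-factor of an atom at position n (relative to the TLS origin).

    Assumes U = T + A L A^T + A S + S^T A^T and B = (8 pi^2 / 3) tr(U).
    T is in A^2, L in rad^2, S in A*rad, n in A.
    """
    A = skew(n)
    At = transpose(A)
    ALA = matmul(matmul(A, L), At)
    AS = matmul(A, S)
    StAt = matmul(transpose(S), At)
    trace = sum(T[i][i] + ALA[i][i] + AS[i][i] + StAt[i][i] for i in range(3))
    return 8.0 * math.pi ** 2 / 3.0 * trace

# Hypothetical example: pure libration about the z axis, atom 10 A away on x.
ZERO = [[0.0] * 3 for _ in range(3)]
L_demo = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.001]]
print(tls_b_factor(ZERO, L_demo, ZERO, (10.0, 0.0, 0.0)))
```

With only a libration term, the B-factor grows with the squared distance of the atom from the rotation axis, which is why atoms near the centre of a TLS group tend to receive low B_TLS values.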

2.2 The TLS parameter refinement 

To optimize the TLS parameters, the program REFMAC [14] from the CCP4 software suite was used. The coordinates and
structure factors were used as input data for REFMAC. During refinement, the parameter values were iteratively adjusted to
find the best fit to the X-ray diffraction data [15-17]. In this study, we set each protein chain as one TLS group and
performed 10 cycles of TLS refinement. The resulting TLS parameters were used to compute B_TLS.

2.3 Datasets of catalytic residues 

The initial dataset was taken from Huang et al. [5]. This dataset contains 760 enzyme X-ray structures selected from the
Catalytic Site Atlas (CSA) [18], with pairwise sequence identity < 30%. We filtered out structures without deposited
structure factors (SF) and those with errors in the SF files. In addition, only structures with all catalytic residues on
the same protein chain were selected. The final dataset consists of 371 enzyme structures (Table 1). We refer to this
dataset as the 371-set throughout this article.
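
The two filtering rules described above can be sketched as follows. The `Entry` record, its field names and the demo identifiers are hypothetical and only illustrate the logic; they do not reproduce the scripts used to build the dataset.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    pdb_id: str
    has_valid_sf: bool       # structure-factor file deposited and free of errors
    catalytic_sites: list    # (chain_id, residue_number) pairs

def select_chain(entry):
    """Return the single chain carrying all catalytic residues, or None."""
    if not entry.has_valid_sf or not entry.catalytic_sites:
        return None
    chains = {chain for chain, _ in entry.catalytic_sites}
    return chains.pop() if len(chains) == 1 else None

# Hypothetical entries: only the first one passes both filters.
entries = [
    Entry("demo1", True, [("A", 10), ("A", 55)]),
    Entry("demo2", True, [("A", 10), ("B", 20)]),
    Entry("demo3", False, [("A", 1)]),
]
kept = [(e.pdb_id, select_chain(e)) for e in entries if select_chain(e)]
print(kept)
```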


III. Results and Discussion 

3.1 Characteristics of the zB and zB_TLS profiles

In the 371-set, we observed that catalytic residues are frequently located at local minima of the zB_TLS profile. Figure 1
shows the profiles for several selected examples: 1W10_A, 1PFQ_A, 1THG_A and 2PGD_A. In these profiles, the catalytic
residues, shown as hollow circles, are almost always located in the minimum regions. It therefore seems that the zB_TLS
profile could be used to predict catalytic residues easily. However, since the TLS-derived B-factor has been reported to
correlate highly with the experimental B-factor [14, 19], the profile characteristics of zB_TLS and zB must be compared to
determine whether the zB_TLS profile is more sensitive than the zB profile in detecting catalytic residues. Figure 2
compares the zB_TLS and zB profiles for the same PDB entries used in Figure 1. The zB_TLS profiles show larger amplitudes
than the zB profiles, and the locations of catalytic residues in the zB profiles show no obvious trend.
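
The profiles discussed here can be illustrated with a short sketch. It assumes that zB denotes the per-chain z-score of the B-factor (the text does not define the normalization explicitly), and the B-factor values are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Normalize a per-residue profile to zero mean and unit variance.

    Assumes the profile is not constant (non-zero standard deviation).
    """
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

def local_minima(profile):
    """Indices whose value is lower than both neighbouring residues."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i] < profile[i - 1] and profile[i] < profile[i + 1]]

# Hypothetical B-factors for six consecutive residues.
b_factors = [30.0, 20.0, 25.0, 40.0, 15.0, 35.0]
profile = z_scores(b_factors)
print(local_minima(profile))
```

Because the z-score is a monotonic rescaling, the local minima of the normalized profile coincide with those of the raw B-factors; the normalization only makes profiles from different structures comparable.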

3.2 Comparison of frequency distributions of amino acid types between catalytic and non-catalytic residues 

Figure 3A compares the amino acid occurrences between catalytic and non-catalytic residues in the 371-set. ASP, HIS, GLU,
and ARG clearly appear more frequently among catalytic residues than among non-catalytic residues. To present the amino
acid occurrences for catalytic residues more clearly, Figure 3B shows the occurrences of the different amino acid types
among catalytic residues in the 371-set in increasing order. The most frequent amino acid types among catalytic residues in
the 371-set are ASP, HIS, GLU, ARG and LYS, which together account for 64% of the total. Figure 3B clearly shows that the
amino acid types accounting for more than 5% are charged or polar amino acids. In non-catalytic residues, however, this
preference does not exist (Figure 3C).
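
A frequency table of this kind can be computed in a few lines. The sketch below uses a hypothetical toy list of residue types rather than the 371-set and sorts the fractions in increasing order, as in Figure 3B.

```python
from collections import Counter

def residue_frequencies(residue_types):
    """Fraction of each amino acid type in a list of three-letter codes."""
    counts = Counter(residue_types)
    total = sum(counts.values())
    return {aa: count / total for aa, count in counts.items()}

# Hypothetical catalytic residue types, not real data.
catalytic = ["ASP", "ASP", "HIS", "GLY"]
freqs = residue_frequencies(catalytic)
ranked = sorted(freqs.items(), key=lambda item: item[1])  # increasing order
print(ranked)
```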
Table 1

The list of PDB IDs (PDB ID:chain) in the 371-dataset

135L:A  1BG0:A  1D60:A  1F48:A  1H7X:C  1KDG:A  1ODT:C  1QJE:A  1TMO:A  1YBV:A
1A0I:A  1BGL:C  1D8C:A  1F6D:B  1HQC:A  1KIM:B  1OE8:B  1QK2:B  1TOX:A  1YCF:A
1A0J:C  1BJO:A  1DB3:A  1F8R:B  1HRK:A  1KNP:A  1OFD:A  1QLH:A  1TYF:I  1Z9H:B
1A26:A  1BQC:A  1DBF:C  1FC4:B  1HTO:B  1KP2:A  1OFG:F  1QMH:B  1TZ3:A  1ZE1:A
1A4I:A  1BRM:B  1DD8:B  1FCQ:A  1HZF:A  1KRA:C  1OG1:A  1QQ5:A  1U7U:A  1ZM2:F
1A4L:A  1BRW:B  1DE6:A  1FDY:C  1I19:A  1KSJ:A  1OH9:A  1QZ9:A  1U8V:C  1ZRZ:A
1A65:A  1BT1:A  1DEK:A  1FOA:A  1I1E:A  1KWS:A  1OJ4:B  1R16:A  1UAG:A  206L:A
1A8Q:A  1BTL:A  1DFO:B  1FOB:A  1I1I:P  1KYQ:B  1OK4:H  1R1J:A  1UAM:A  2A0N:A
1A95:C  1BWZ:A  1DGS:B  1FQ0:A  1I29:A  1KYW:F  1OKG:A  1R30:A  1UAQ:B  2ACE:A
1AB4:A  1BZC:A  1DIO:L  1FR2:B  1I78:B  1KZH:A  1ONR:A  1R4F:B  1UAS:A  2ADM:A
1ABR:A  1BZY:B  1DMU:A  1FR8:A  1I9A:A  1L1D:A  1OS7:B  1R4Z:A  1UCH:A  2AYH:A
1AF7:A  1C2T:A  1DNP:A  1FRO:C  1IDJ:B  1LCB:A  1OTG:C  1R6W:A  1UF7:B  2BHG:A
1AFW:B  1C3J:A  1DUP:A  1FUA:A  1IM5:A  1LCI:A  1OYG:A  1R76:A  1UK7:A  2BIF:B
1AGY:A  1C82:A  1DZR:A  1FVA:B  1IR3:A  1LJL:A  1P4R:B  1RA2:A  1ULA:A  2BKR:A
1AKD:A  1C9U:B  1E0C:A  1G0D:A  1ITQ:B  1LML:A  1P5D:X  1RBL:A  1UN1:B  2BX4:A
1AKM:A  1CB8:A  1E19:B  1G64:B  1ITX:A  1LNH:A  1PFK:A  1RHC:A  1UQR:A  2CPO:A
1AKO:A  1CF2:Q  1E2T:F  1G6T:A  1IU4:A  1LVH:A  1PFQ:B  1RHS:A  1UQT:B  2DOR:B
1AL6:A  1CG2:C  1E5Q:E  1G72:A  1J00:A  1M21:B  1PGS:A  1RK2:C  1URO:A  2ENG:A
1AMP:A  1CGK:A  1E6E:A  1G79:A  1J09:A  1M6K:A  1PIX:B  1RO7:A  1V04:A  2F61:A
1AMY:A  1CHD:A  1EB6:A  1G8F:A  1J49:B  1MLV:B  1PJ5:A  1ROZ:A  1V0E:B  2HDH:A
1APX:A  1CHK:B  1EBF:A  1G8P:A  1J53:A  1MOQ:A  1PJA:A  1RPX:C  1V0Y:A  2JCW:A
1AQ2:A  1CK7:A  1EC9:C  1G99:A  1J79:B  1MPX:C  1PJH:A  1RQL:B  1V25:B  2LIP:A
1ARZ:B  1CNS:A  1ECL:A  1GA8:A  1J7G:A  1MRQ:A  1PMI:A  1RTU:A  1W0H:A  2NAC:A
1AUG:D  1COY:A  1ECX:B  1GAL:A  1JCH:A  1MVN:A  1PS9:A  1RU4:A  1W10:A  2NLR:A
1AUK:A  1CTN:A  1EEJ:A  1GDH:B  1JH6:A  1N20:A  1PWV:B  1S3I:A  1W2N:A  2NPX:A
1AUO:A  1CTT:A  1EF0:A  1GE7:A  1JHF:A  1NDI:A  1PXV:B  1S95:B  1WD8:A  2PFL:A
1AVQ:C  1CV2:A  1EH5:A  1GIM:A  1JM6:A  1NID:A  1PZ3:B  1S9C:B  1WNW:C  2PGD:A
1AX4:B  1CVR:A  1EHY:A  1GNS:A  1JMS:A  1NIR:A  1Q18:B  1SLL:A  1X7D:A  2PLC:A
1B02:A  1CW0:A  1EI5:A  1GOG:A  1JOF:E  1NML:A  1Q3Q:C  1SLM:A  1X9H:A  2SQC:A
1B04:A  1CZ1:A  1ELQ:B  1GP5:A  1JQN:A  1NVM:G  1Q91:A  1SML:A  1X9Y:B  2THI:A
1B57:A  1CZF:B  1ESO:A  1GQ8:A  1JS4:B  1NVT:B  1QBA:A  1SNN:A  1XGM:B  2TOH:A
1B5Q:B  1D0S:A  1EUG:A  1GQG:B  1JXH:A  1NWW:A  1QCN:A  1SZJ:R  1XIK:A  2TS1:A
1B6B:B  1D1Q:B  1EUY:A  1GT7:A  1K0W:B  1O04:E  1QF6:A  1T0U:B  1XQW:A  2YPN:A
1B7Y:A  1D2R:E  1EY2:A  1GUF:B  1K30:A  1O98:A  1QGX:A  1T7D:A  1XRS:B  3R1R:A
1B8F:A  1D2T:A  1EYP:A  1GXS:C  1K32:A  1OAC:B  1QH9:A  1TDJ:A  1XVT:A  5COX:D
1B8G:B  1D3G:A  1EZ1:A  1GZ6:A  1KC7:A  1OAS:A  1QHG:A  1THG:A  1Y9M:A  7ODC:A
1BF2:A  1D4A:B  1F2V:A  1H19:A  1KCZ:A  1OBA:A  1QHO:A  1TML:A  1YBQ:A  8TLN:E
1BFD:A











All protein chains in the 371-dataset were solved by X-ray crystallography and have pairwise sequence identity < 30%. Each
chain in the 371-dataset was refined by TLS refinement with one TLS group per chain using the REFMAC program.

3.3 Selection of the prediction thresholds 

To perform the prediction, a threshold value must first be set. Residues with zB_TLS values below this threshold (or
cutoff) are defined as candidate catalytic residues. The threshold was determined by 10-fold cross-validation on a subset
of 180 PDB entries from the 371-set with resolution < 2.0 Å. The best cutoff value was found to be -0.6, giving an
accuracy, sensitivity and specificity of 70%, 71% and 79%, respectively.
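
The thresholding step can be sketched as below. The code uses the standard definitions of accuracy, sensitivity and specificity, which may differ in detail from those used in this study (for example, if values were averaged per structure). The profile and the catalytic positions are hypothetical.

```python
def evaluate_cutoff(z_profile, catalytic_idx, cutoff=-0.6):
    """Predict residues with z < cutoff as catalytic and score the prediction.

    Assumes the chain contains at least one catalytic and one
    non-catalytic residue.
    """
    catalytic = set(catalytic_idx)
    tp = fp = tn = fn = 0
    for i, z in enumerate(z_profile):
        predicted = z < cutoff
        actual = i in catalytic
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / len(z_profile),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical zB_TLS profile with catalytic residues at indices 0 and 4.
print(evaluate_cutoff([-1.0, -0.7, 0.0, 0.5, -0.2], [0, 4]))
```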
Figure 1. Prediction profiles based on the TLS method for several PDB entries selected from the 371-dataset: 1W10_A,
1PFQ_A, 1THG_A and 2PGD_A. For each case, the cutoff value is shown as a dashed line. Hollow circles indicate the
catalytic residues.

Figure 2. Comparison of the profiles and the catalytic residue locations between TLS-derived B-factor (B_TLS) profiles
and experimental B-factor profiles for 1W10_A, 1PFQ_A, 1THG_A and 2PGD_A.

Figure 3. Comparison of the amino acid distributions between catalytic and non-catalytic residues. (A) The comparison of
amino acid occurrences between catalytic and non-catalytic residues in the 371-set. (B) The occurrences of the different
amino acid types among catalytic residues. (C) The amino acid preference in non-catalytic residues.

3.4 Resolution effect on the performance of catalytic residue prediction based on B_TLS

Because the computation of B_TLS involves fitting parameters against X-ray diffraction data, the quality (i.e., the
resolution) of the data influences the calculated B_TLS. To test the influence of data quality, we varied the resolution
cutoff (< 2.0 Å, < 1.9 Å, < 1.8 Å, < 1.7 Å, < 1.6 Å and < 1.5 Å), filtered out from the 371-set the PDB entries with
resolution values larger than the cutoff, and plotted the receiver operating characteristic (ROC) curve for each
resolution cutoff. The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR). TPR is defined
as the number of true positive predictions (i.e., correctly predicted catalytic residues) divided by the total number of
actual positives (i.e., all catalytic residues), while FPR is defined as the number of false positive predictions (i.e.,
non-catalytic residues incorrectly predicted as catalytic) divided by the total number of actual negatives (i.e., all
non-catalytic residues). Figure 4 compares the ROC curves for the different resolution cutoffs. According to the ROC
curves, performance improves as the resolution cutoff becomes stricter. Therefore, we selected the structures with
resolution better than 2.0 Å from the 371-set as the training data.
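
A ROC curve of this kind can be traced by sweeping the zB_TLS cutoff and recording one (FPR, TPR) point per cutoff. The sketch below is a minimal illustration with hypothetical data and function names; it uses the standard definitions TPR = TP / (all catalytic residues) and FPR = FP / (all non-catalytic residues).

```python
def roc_points(z_profile, catalytic_idx, cutoffs):
    """Return one (FPR, TPR) point per cutoff; residues with z < cutoff are positives.

    Assumes at least one catalytic and one non-catalytic residue.
    """
    catalytic = set(catalytic_idx)
    n_pos = len(catalytic)
    n_neg = len(z_profile) - n_pos
    points = []
    for cutoff in cutoffs:
        tp = sum(1 for i, z in enumerate(z_profile) if z < cutoff and i in catalytic)
        fp = sum(1 for i, z in enumerate(z_profile) if z < cutoff and i not in catalytic)
        points.append((fp / n_neg, tp / n_pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC points, sorted by FPR."""
    pts = sorted(points)
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

# Hypothetical profile in which the two catalytic residues have the lowest z values.
pts = roc_points([-2.0, -1.0, 0.0, 1.0], [0, 1], [-3.0, -1.5, -0.5, 0.5, 2.0])
print(pts, area_under_curve(pts))
```

Repeating this for each resolution-filtered subset gives one curve per resolution cutoff, which is the comparison shown in Figure 4.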



Figure 4. ROC curves showing the prediction performance under different resolution cutoffs.

3.5 Comparison with other methods 

Because our prediction method is based on atomic fluctuations, we evaluated its performance by comparing the
B_TLS-profiling method with another atomic fluctuation-based method. To the best of our knowledge, the weighted contact
number (WCN) model is currently the best fluctuation-based model for predicting catalytic residues. In Figure 5, we compare
the prediction accuracies calculated for each PDB entry in the 371-set with the TLS and WCN models. The average accuracies
are both around 0.70 at a cutoff of -0.6.
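
For reference, the crowdedness measure used by the WCN model can be sketched as follows. The sketch assumes the usual definition from the WCN literature, in which the weighted contact number of a residue is the sum of inverse squared distances to all other residues (typically measured between Cα atoms). The definition is not stated in this article, and the coordinates are hypothetical.

```python
def weighted_contact_numbers(coords):
    """WCN of each residue: sum over all other residues of 1 / distance^2.

    Assumes no two residues share identical coordinates.
    """
    wcn = []
    for i, (xi, yi, zi) in enumerate(coords):
        total = 0.0
        for j, (xj, yj, zj) in enumerate(coords):
            if i != j:
                total += 1.0 / ((xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2)
        wcn.append(total)
    return wcn

# Three hypothetical residues on a line, 1 A apart: the middle one is the most crowded.
print(weighted_contact_numbers([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]))
```

A larger WCN means a more crowded environment and therefore a smaller expected fluctuation, so crowded residues in the WCN model play the same role as low zB_TLS residues in the TLS-based profile.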
Figure 5. Scatter plots of the computed fluctuations showing the comparison between the TLS and WCN models.

IV. Conclusion 

In this study, we used the TLS model to simulate atomic fluctuations in order to investigate whether the signature of
catalytic residues is hidden in X-ray diffraction data. Through several designed experiments, we found that TLS-derived
B-factors can predict the positions of catalytic residues in proteins at a cutoff value of -0.6. This not only demonstrates
the viability of using the TLS model for catalytic residue identification, but also hints that the signatures of catalytic
residues might be hidden in X-ray diffraction data. Further studies are needed to clarify this point.